 radar pin


Radar Camera Fusion via Representation Learning in Autonomous Driving

arXiv.org Artificial Intelligence

Radars and cameras are mature, cost-effective, and robust sensors and have been widely used in the perception stack of mass-produced autonomous driving systems. Due to their complementary properties, outputs from radar detection (radar pins) and camera perception (2D bounding boxes) are usually fused to generate the best perception results. The key to successful radar-camera fusion is accurate data association. The challenges in radar-camera association can be attributed to the complexity of driving scenes, the noisy and sparse nature of radar measurements, and the depth ambiguity from 2D bounding boxes. Traditional rule-based association methods are susceptible to performance degradation in challenging scenarios and failure in corner cases. In this study, we propose to address radar-camera association via deep representation learning, to explore feature-level interaction and global reasoning. Concretely, we design a loss sampling mechanism and an innovative ordinal loss to overcome the difficulty of imperfect labeling and to enforce critical human reasoning. Despite being trained with noisy labels generated by a rule-based algorithm, our proposed method achieves an F1 score of 92.2%, which is 11.6% higher than the rule-based teacher. Moreover, this data-driven method also lends itself to continuous improvement via corner case mining.
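To make the reported metric concrete, here is a minimal sketch of how an F1 score over radar-camera associations could be computed. It assumes each association is an (id of radar pin, id of bounding box) pair and that a prediction counts as correct only if the exact pair appears in the reference set; the function name, the IDs, and this matching rule are illustrative assumptions, not the paper's evaluation protocol.

```python
def association_f1(predicted, reference):
    """F1 score over sets of (radar_pin_id, box_id) association pairs.

    Hypothetical helper: exact-pair matching is an assumption made
    for illustration, not the evaluation used in the paper.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # correct pairs among predictions
    recall = true_pos / len(reference)     # correct pairs among references
    return 2 * precision * recall / (precision + recall)


# Made-up example: 4 predicted pairs, 3 reference pairs, 2 in common.
predicted_pairs = [(0, 10), (1, 11), (2, 12), (3, 10)]
reference_pairs = [(0, 10), (1, 11), (2, 13)]
score = association_f1(predicted_pairs, reference_pairs)
print(f"F1 = {score:.4f}")
```

With 2 true positives, 2 false positives, and 1 false negative, precision is 2/4 and recall is 2/3, so the score is 4/7.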


Convolutional Neural Networks With Heterogeneous Metadata

#artificialintelligence

In autonomous driving, convolutional neural networks are the go-to tool for various perception tasks. Although CNNs are great at distilling information from camera images (or a sequence of them in the form of a video clip), I constantly bump into all kinds of metadata that do not lend themselves to convolutional neural networks. Metadata, by traditional definition, means a set of data used to describe other data. All these properties make it hard for a CNN to consume the metadata directly, because a CNN assumes that data lies on a regularly spaced grid and that neighboring cells on the grid have a closer spatial or semantic relationship. One special case is lidar point cloud data.
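The excerpt does not say how the article resolves this mismatch. As an illustration only, one common workaround is to rasterize sparse point-like metadata onto a regular grid so it can be stacked as an extra input channel. The sketch below assumes each point is a (row, col, value) triple already expressed in grid coordinates; the function name and the sample values are hypothetical.

```python
def rasterize_points(points, height, width):
    """Scatter sparse (row, col, value) points onto a dense 2D grid.

    Cells with no point stay at 0.0; points outside the grid are dropped.
    If two points share a cell, the later one overwrites the earlier one.
    """
    grid = [[0.0] * width for _ in range(height)]
    for row, col, value in points:
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = value
    return grid


# Made-up radar-like points: (row, col, range in meters).
radar_points = [(1, 2, 35.0), (3, 0, 12.5), (9, 9, 50.0)]
depth_channel = rasterize_points(radar_points, height=4, width=4)
for grid_row in depth_channel:
    print(grid_row)
```

The resulting grid is mostly zeros, which shows the cost of this approach: the regular layout a CNN expects is recovered, but the representation is very sparse.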